Finding document topics for improving topic segmentation
Abstract
Topic segmentation and topic identification are often tackled as separate problems, although both are part of topic analysis. In this article, we study how topic identification can help improve a topic segmenter based on word reiteration. We first present an unsupervised method for discovering the topics of a text. We then detail how the segmenter uses these topics to find topical similarities between text segments. Finally, we demonstrate the value of the proposed method through an evaluation carried out on both French and English.
Similar resources
A Probabilistic Topic Model Based on Local Word Relations in Overlapping Windows
A probabilistic topic model assumes that documents are generated through a process involving topics and then tries to reverse this process, given the documents and extract topics. A topic is usually assumed to be a distribution over words. LDA is one of the first and most popular topic models introduced so far. In the document generation process assumed by LDA, each document is a distribution o...
A Dynamic Topic Model for Document Segmentation
Factor language models, like Latent Semantic Analysis, represent documents as mixtures of topics, and have a variety of applications. Normally, the mixture is computed at the whole-document level, that is, the entire document contains material on several topics, without specifying where they occur in the document. In this paper, we describe a new model which computes the topic mixture estimate ...
Improving Text Segmentation by Combining Endogenous and Exogenous Methods
Topic segmentation was addressed by a large amount of work from which it is not easy to draw conclusions, especially about the need for knowledge. In this article, we propose to combine in the same framework two methods for improving the results of a topic segmenter based on lexical reiteration. The first one is endogenous and exploits the distributional similarity of words in a document for di...
Topic Modeling in Financial Documents
This paper describes the application of topic modeling techniques to quarterly earnings call transcripts of publicly traded companies. Earnings call transcripts represent an interesting case for analysis because the document is relatively unstructured and potentially more informative than 10K and 10Q disclosures due to the question and answer session consisting of unprepared statements. This pa...
Sampling Table Configurations for the Hierarchical Poisson-Dirichlet Process
• Discrete hierarchies are ubiquitous in intelligent systems.
• The Poisson-Dirichlet process (PDP) [1] allows statistical inference and learning on discrete hierarchies, e.g., a hierarchy of Dirichlet distributions.
• Applications of the PDP/HPDP include but are not limited to:
  – Topic modeling: Finding meaningful topics discussed in a large set of documents. Beneficial to automatic document analysis a...
Publication date: 2007